Closes #16798
I ran this PR with the q8_0 by DevQuasar and it seems to be working. Full command and perplexity results (they look fine) here: https://huggingface.co/DevQuasar/MiniMaxAI.MiniMax-M2-GGUF/discussions/1
CISC left a comment:
Remove the vocab files and test; if there is a good reason to test the vocab (which AFAICT there is not), we can add it to ggml-org/vocabs on HF.
Tool calls don't work yet? Or is that just this particular GGUF (from bullerwins)?
Done.
Argh, stupid codespaces. @CISC rebased on current master, should be OK now.
@pwilkin Fantastic work, and thanks as always for your open source work!
Is it worth merging if this does not work?
I think the jinja template works if you just remove
@CISC Yep, I normally just remove it for now.
It's weird, too; I don't understand why some are using it in their templates when it's the default. Makes no sense...
CISC left a comment:
Ready to merge when CIs are done.
@CISC looks OK to me; the failures are unrelated (webgpu).
Seems similar to gpt-oss in this regard, except for all messages and not just tool calls. It should work if clients pass back assistant messages with
Guys, any idea why the thinking tag still didn't get fixed?
Once --reasoning-format none is set on the backend, everything should work, as the reasoning content will be passed back to the server; the rest is purely cosmetic, like adding a lightweight, dedicated front-end filter or toggle to handle multiple <think>...</think> blocks gracefully. We could take a more modular approach: the backend could properly parse the blocks and send alternating delta reasoning_content / delta content, while a simple front-end option could resend "reasoning_content as content" with a configurable delimiter. It would fit nicely within the OpenAI-compat layer. It might be a bit of over-engineering, but it would cover all possible cases without needing any additional parsing logic or front-end hacks.
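The "alternating delta reasoning_content / delta content" idea above can be sketched as a small parser that splits a raw response into ordered reasoning/content segments on <think>...</think> markers. This is a minimal illustration, not llama.cpp's actual parser; the function name and return shape are made up.

```python
import re

# Matches one <think>...</think> block, including across newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split a model response into an ordered list
    of ("reasoning", text) / ("content", text) segments."""
    segments = []
    pos = 0
    for m in THINK_BLOCK.finditer(raw):
        if m.start() > pos:                      # plain content before the block
            segments.append(("content", raw[pos:m.start()]))
        segments.append(("reasoning", m.group(1)))
        pos = m.end()
    if pos < len(raw):                           # trailing content after the last block
        segments.append(("content", raw[pos:]))
    return segments

# Example: a response with two interleaved thinking blocks, as MiniMax-M2 emits.
raw = "<think>plan step</think>Answer part 1<think>revise</think>Answer part 2"
print(split_reasoning(raw))
```

A server could then stream each segment as a delta of the matching field, while a front-end toggle could instead re-join the reasoning segments into content with a configurable delimiter.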
Adding --reasoning-format none still results in missing think tags.
OK: https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja. MiniMax-M2 is the first model that actually requires this behavior (the reasoning_content must be preserved in context), so it deserves its own special option.
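A hedged sketch of what "reasoning_content must be preserved in context" means on the client side: when resending history, the assistant's reasoning_content field is kept alongside content instead of being stripped, so the template can re-emit the <think> blocks. Field names follow the OpenAI-compatible schema; the messages themselves are invented for illustration.

```python
# Illustrative message history only; not llama.cpp code.
history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        # Kept and resent on later turns, as the MiniMax-M2 template expects.
        "reasoning_content": "The user asks for a simple sum; 2 + 2 = 4.",
        "content": "4",
    },
    # Next turn: the assistant message above is passed back verbatim,
    # reasoning_content included, rather than content-only.
    {"role": "user", "content": "And doubled?"},
]

# A client that strips reasoning_content (the usual default for most
# models) would break the context MiniMax-M2 expects.
kept = [m for m in history if "reasoning_content" in m]
print(len(kept))  # → 1
```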
I don't know why they merged this PR; IMO this is not good.
Not everything has to (or should) be done in a single PR. |
For anyone interested in enabling tool calls for Minimax M2, refer to PR #16932 — I’ve managed to get tool calls working. |
* Model: Minimax M2
* Cleanup
* Cleanup pt. 2
* Cleanup pt. 3
* Update convert_hf_to_gguf_update.py - merge catch blocks
* Remove vocab models and test
* Remove all redundant hparam settings covered by TextModel
* Move super to start, don't set block_count
* Update src/llama-model.cpp
* Update gguf-py/gguf/constants.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>




Implementation for Minimax M2. Not doing the chat template yet, because I'm not sure how to handle the interleaved thinking blocks.